learning rate
Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs
Liang, Yifei, Sun, Yan, Cao, Xiaochun, Shen, Li
Push-Sum-based decentralized learning enables optimization over directed communication networks, where information exchange may be asymmetric. While convergence properties of such methods are well understood, their finite-iteration stability and generalization behavior remain unclear due to structural bias induced by column-stochastic mixing and asymmetric error propagation. In this work, we develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm that captures the effect of directed topology. A key technical ingredient is an imbalance-aware consistency bound for Push-Sum, which controls consensus deviation through two quantities: the stationary distribution imbalance parameter $δ$ and the spectral gap $(1-λ)$ governing mixing speed. This decomposition enables us to disentangle statistical effects from topology-induced bias. We establish finite-iteration stability and optimization guarantees for both convex objectives and non-convex objectives satisfying the Polyak--Łojasiewicz condition. For convex problems, SGP attains excess generalization error of order $\tilde{\mathcal{O}}\!\left(\frac{1}{\sqrt{mn}}+\fracγ{δ(1-λ)}+γ\right)$ under step-size schedules, and we characterize the corresponding optimal early stopping time that minimizes this bound. For PŁ objectives, we obtain convex-like optimization and generalization rates with dominant dependence proportional to $κ\!\left(1+\frac{1}{δ(1-λ)}\right)$, revealing a multiplicative coupling between problem conditioning and directed communication topology. Our analysis clarifies when Push-Sum correction is necessary compared with standard decentralized SGD and quantifies how imbalance and mixing jointly shape the best attainable learning performance.
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Europe > Montenegro (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
Appendix A Proof of Theorem 2.1
We have the following lemma. Using the notation of Lemma A.1, we have E The third inequality uses the Lipschitz assumption of the loss function. Figure 10 supplements'Relation to disagreement ' at the end of Section 2. It shows an example where the behavior of inconsistency is different from disagreement. All the experiments were done using GPUs (A100 or older). The goal of the experiments reported in Section 3.1 was to find whether/how the predictiveness of The arrows indicate the direction of training becoming longer.
- North America > United States > Ohio (0.04)
- Asia > Singapore (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.92)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
- North America > United States > Virginia (0.04)
- North America > United States > Maryland (0.04)